Relationship between internet research data of oral neoplasms and public health programs in the European Union

Background Tobacco and alcohol are the main risk factors for oral squamous cell carcinoma, the low survival rate of which is a public health problem. European-wide health policies (a prevention campaign, tobacco packaging) have been put in place to inform the population of the risks associated with consumption. Due to the increase in smoking among women, the incidence of this disease remains high. The identification of internet research data on the population could help to measure the impact of and better position these preventive measures. The objective was to analyze a potential temporal association between public health programs and interest in oral cancers on the internet in the European Union (EU).
Methods A search of data from Google©, Wikipedia© and Twitter© users in 28 European countries relating to oral cancer between 2004 and 2019 was completed. Bibliometric analysis of press and scientific articles over the same period was also performed. The association between these data and the introduction of public health programs in Europe was studied.
Results There was a temporal association between changes in tobacco packaging and a significant increase in internet searches for oral cancer in seven countries. Unlike national policies and ad campaigns, the European awareness program Make Sense has had no influence on internet research. There was an asymmetric correlation in internet searches between publications on oral cancer from scientific articles or "traditional" media (weak association) and those from internet media such as Twitter© or Wikipedia© (strong association).
Conclusion Our work highlights seven areas around which oral cancer awareness in Europe could be refocused, such as a change in the communication of health warnings on cigarette packs, the establishment of a more explicit campaign name regarding oral cancer, the involvement of public figures and associations in initiatives to be organized at the local level and the strengthening of awareness of the dangers of tobacco in the development of oral cancer.
Supplementary Information The online version contains supplementary material available at 10.1186/s12903-021-02022-z.

about the potential consequences of their consumption [1,2].
The increase in tobacco consumption among women and the evolution of sexual practices have led, respectively, to an increase in oral cancers and in oropharyngeal cancers caused by human papillomavirus (HPV) [3][4][5].
The low survival rate of oral cancers justifies effective prevention and screening.

Preventing and raising awareness of oral cancers among the population of the European Union
By limiting the consumption of tobacco and alcohol and raising public awareness of their dangers, the worldwide prevalence of the disease could be reduced by 75% [6].
Prevention is also an economic issue. In Europe, the average annual cost of an oral cancer patient in 2012 was between €20,000 and €23,000 [7].
In the EU, smokers have been informed of the risks of tobacco consumption since the introduction of directive 2014/40/EU. Since 2016, this has forced the 28 member states to follow rules on the manufacturing, presentation and sale of tobacco and its derived products.
Since 2013, the Make Sense Campaign (MSC) has been raising awareness and providing information to the European population about head and neck cancers. Organized by the European Head & Neck Society (EHNS), the MSC involved 18 countries in 2018. In parallel, certain European countries have organized their own national campaigns [8].

Evaluating the effects of health measures on the EU population
It would be tedious to survey a population as large as that of the EU to evaluate the large-scale impact of these measures.
Internet search trends have been shown to reflect changes in society over time. They show a marked increase in the number of searches during epidemics or whenever interest in a disease is heightened [9,10].
This has been demonstrated since the start of the COVID-19 pandemic [11].
The analysis of data obtained by such internet research would be effective for the study of illnesses, with precision comparable to normal epidemiological methods [12][13][14].
In 2016, Ayers et al. raised the possibility of using big data to quickly and cheaply evaluate the effects of awareness campaigns. They studied the effects of a national no-smoking day in the United States (the Great American Smokeout) and showed that it led to a significant increase in the number of internet searches [15].
To be effective, an awareness campaign (screening or diagnosis) must attract the interest of the public, particularly in oncology, where the objective is to make the population, especially those at risk, aware of the usefulness of early detection [16]. To assess the likely interest of the population, our goal was to identify whether there is a temporal association between the implementation of an awareness campaign or preventive measure and the number of internet searches among the population. Owing to the inherent limits of this kind of epidemiological research, such an association would not reflect a direct causal link between public health policies and the interest shown by the population, but rather a potential temporal association that can inform reflection on the efficiency of awareness campaigns [17].
The objective was to analyze a potential temporal association between public health programs and interest in oral cancers on the internet in the EU.
We analyzed the search data related to oral cancers from Google Trends©, the bibliometric analysis of scientific articles on the topic, Wikipedia page consultations, and articles published in the press and on Twitter© and cross checked them against the data on the introduction of anti-tobacco public health care programs in the EU.

Working outline and inclusion criteria
This observational retrospective bibliometric analysis used search data collected from Google©, Wikipedia© and Twitter© users in the 28 EU countries between January 1, 2004 and September 30, 2018.
The data on press articles published and the bibliometric analysis of scientific articles during the same period were collected.
The temporal association between these results and the introduction of public health programs in the EU over the same period was studied.
On September 30, 2018, the countries included in this study had to be members of the EU, have an internet penetration rate (percentage of the population with internet access) of over 50%, have a Google© search engine usage rate of over 50% and participate in the MSC.

Identification of public health programs related to oral cancers in the EU
The oral cancer risk factor prevention awareness campaigns of each EU country included were researched alongside public data from the World Health Organization (WHO) and the EHNS.
A search was carried out on the WHO website (http://www.euro.who.int) in the 'health topics' category. The health programs of each EU country were then identified using the 'Alcohol Use', 'Oral Health', 'Tobacco', 'Vaccines and Immunization' and 'Human Papillomavirus and Cervical Cancer' pages.

Data collection
Data collection was standardized for each country.
A list of oral cancer clinical presentation key words that were compliant with the ICD-11 was created and translated into the EU's 24 official languages. For countries with more than one official language, the key words in each official language were listed (Additional file 1: Annex 1) [18].

a) Google Trends©
Search terms based on these keywords were entered into Google Trends© in the official language(s) of each included country to generate data linked to interest shown during the time period and in the geographical area studied.
The research was executed according to recommendations from Nuti et al. by entering each keyword as a 'search term' in the 'health' category [12].
Google Trends data do not provide an absolute value for interest in each search term. Instead, they provide an index, the relative search volume (RSV), which expresses the number of searches completed for each term relative to the total number of searches completed on Google. This index is scaled so that the maximum value in a given Google Trends search is 100.
Search terms that generated no data (RSV = 0) for a country were excluded from the analysis.
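The scaling described above can be illustrated with a minimal sketch. It assumes the simplified definition given in this section (a term's share of all searches in each period, rescaled so that the peak equals 100); Google's actual procedure, which also involves sampling, is not reproduced here. The function name and the counts are hypothetical and serve only as an illustration.

```python
def relative_search_volume(term_counts, total_counts):
    """Rescale a term's share of all searches so that the peak period equals 100."""
    # Share of all searches that concern the term, period by period.
    shares = [term / total for term, total in zip(term_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        # No searches at all for the term: RSV = 0 throughout (such terms were excluded).
        return [0] * len(shares)
    return [round(100 * share / peak) for share in shares]


# Invented counts for three periods (not data from the study).
term_searches = [50, 100, 200]
all_searches = [1000, 1000, 4000]
rsv = relative_search_volume(term_searches, all_searches)
```

In this invented example the third period has the most searches for the term in absolute numbers, yet its RSV is lower than that of the second period because total search activity was four times higher. This is why RSV values reflect relative interest, not absolute volume.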

b) Wikipedia
The Wikipedia page view statistics for oral cancers in the languages of each of the included countries were collected from July 1, 2015 (date from which the statistics are publicly available), until September 30, 2018.
Data were collected using the same keywords as those previously used in the official language(s) of each country.
Pages whose view statistics were not available were not included.

c) Twitter©
Public messages on Twitter© (tweets) about oral cancers between January 1, 2013 and September 30, 2018 were identified using a keyword search in the 24 official languages of the EU.
The start date was chosen empirically by the authors, considering that before 2013, this social network was not as widely used in the EU as in the period of 2013-2018. A preliminary search that found more than 100,000 tweets about oral cancers during this period confirmed the choice.
The keywords used were identical to those used for the collection of data on Google Trends© and Wikipedia.
The number of users posting tweets and the number of reactions ('retweets' and 'likes') were recorded.

d) Europresse©
The Europresse database was used to evaluate media coverage of oral cancers.
A search of press articles was executed. The search area was limited to Europe between January 1, 2004, and September 30, 2018. The same search terms were used as those for Google Trends©, Wikipedia and Twitter©.

e) Bibliometric analysis
The Web of Science Core Collection and MEDLINE databases were used to complete a bibliometric analysis of scientific articles published between January 1, 2004, and December 31, 2018. The oral cancer key words were the same as those found in the MeSH (Medical Subject Headings), combined with the Boolean operator "OR": "mouth neoplasm", "mouth cancer", and "oral cancer".

Graphical and statistical methods of analysis
Using the data generated by Google Trends©, Wikipedia, Twitter and Europresse©, descriptive statistics and scatterplots were created for each search term with fitted polynomial trendlines [12][19][20][21].
Linear regression was used to describe the trend over time in the number of scientific articles identified by the bibliometric analysis.
To assess the relationship between (1) the searches carried out on Google©, Wikipedia, Twitter and Europresse and (2) the introduction of health care programs, Student's t test was used. This made it possible to compare the data linked to interest before and after the introduction of a health care program and to assess the significance of the variations. So that the significance threshold would not be affected by a possible cumulative effect of the measurements over time, a Bonferroni correction was applied for each data collection.
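As an illustration of the before/after comparison, the sketch below computes a two-sample Student's t statistic with pooled variance using only the Python standard library. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name and the monthly values are invented, and equal variances are assumed. The P value would then be read from the t distribution with the returned degrees of freedom, which a statistics package can do.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(before, after):
    """Return (t, degrees of freedom) for a two-sample Student's t test."""
    n1, n2 = len(before), len(after)
    dof = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / dof
    t = (mean(after) - mean(before)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, dof


# Invented monthly RSV values before and after a measure (not data from the study).
rsv_before = [20, 22, 19, 21, 23, 20]
rsv_after = [30, 28, 33, 31, 29, 32]
t_stat, dof = pooled_t_statistic(rsv_before, rsv_after)
```

A positive statistic indicates higher mean interest after the measure than before; whether the difference is significant depends on the threshold retained after correction for multiple comparisons.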
Finally, Microsoft Excel© and MathWorks MATLAB© software were used to compare the search results from Google©, Wikipedia, Twitter and Europresse© with those from the bibliometric analysis, using analysis of variance (ANOVA) and the Spearman rank correlation coefficient because the data did not follow a Gaussian distribution.
All statistical tests were used after verification of their application conditions.
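The Spearman coefficient mentioned above is the Pearson correlation computed on ranks, which is why it does not require Gaussian data. The following minimal sketch uses only the standard library and assigns average ranks to tied values. The function names and the monthly counts are hypothetical, and it does not reproduce the Excel or MATLAB routines used in the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Invented monthly counts for two sources (not data from the study).
tweets = [120, 340, 150, 500, 410]
articles = [3, 9, 4, 15, 8]
rho = spearman_rho(tweets, articles)
```

Because only the ordering of the values matters, any monotonic relationship between two sources yields a coefficient close to 1 or -1, even if the relationship is not linear.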

Public health programs
Each prevention measure we identified concerned tobacco: the introduction of health warnings and shock images on cigarette packets in Belgium (2006)

Google Trends©
Twenty EU countries were included. Eight countries (Cyprus, Greece, Hungary, Italy, Malta, Portugal, Romania and Sweden) were excluded due to a lack of usable data, particularly because searches on the four keywords could not be carried out for these countries.
In total, 43 searches in 17 languages for four keywords were completed: lip cancer (ten times), tongue cancer (thirteen times), gum cancer (six times) and mouth cancer (thirteen times).
We noted a general increase in the popularity of the search terms over the period studied, with an average increase in interest of 8.1% (mouth cancer: 14.2%; lip cancer: 8.3%; gum cancer: 5.5%; tongue cancer: 4.5%). Figure 1 shows the scatter plots and linear regression curves for the four search terms across all countries included.

Wikipedia
The statistics for Wikipedia page views connected to oral cancers were available in nine languages (Table 1).

Twitter©
Ninety-one percent of tweets about oral cancers were published in English. The 100 tweets with the most reactions were published using accounts with high numbers of followers and routinely relayed the oral cancer diagnosis of a public figure.

Press articles
Searches on Europresse revealed 787 articles in English, 735 in French and 392 in German (Fig. 4). Searches for articles in other languages did not return enough results to be useful.

Introduction of new public health care programs
The influence of the introduction of public health measures on the interest shown in oral cancers on Google©, Wikipedia and in the press is shown in Table 2.
Countries that introduced health warnings and shock images on cigarette packets before the enforcement of directive 2014/40/EU are presented separately, so as not to bias the Bonferroni correction.
A significant increase in Google© searches followed the introduction of health warnings on cigarette packets in Spain (P = 0.03), France (P = 0.001) and the French-speaking part of Belgium (P = 0.02). These countries introduced health warnings before directive 2014/40/EU or the national or European prevention campaigns, and the Bonferroni correction could not be applied. We observed a significant increase in interest in oral cancers since the enforcement of directive 2014/40/EU in Denmark (P < 0.001), Finland (P < 0.001), France (P = 0.01) and the United Kingdom (P < 0.001). A significant increase in searches for terms corresponding to "mouth cancers" was also seen in Germany (P < 0.001), but not in Bulgaria (P = 0.003) or the Czech Republic (P = 0.01) (significance level with Bonferroni correction: P = 0.05/39 = 0.00128).
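The corrected threshold quoted above follows from dividing the nominal significance level by the number of comparisons. The minimal sketch below reproduces that arithmetic using only values reported in this paragraph (a nominal level of 0.05, 39 comparisons, and the P values for Bulgaria and the Czech Republic); the function and variable names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Values reported in the text: nominal level 0.05 and 39 comparisons.
threshold = bonferroni_threshold(0.05, 39)

# P values reported in the text for the two countries that did not pass the threshold.
reported_p = {"Bulgaria": 0.003, "Czech Republic": 0.01}
significant = {country: p < threshold for country, p in reported_p.items()}
```

Both values would be significant at the uncorrected 0.05 level but exceed the corrected threshold, whereas a result reported as P < 0.001 lies below it.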
The MSC had no influence on Google© searches, except in Ireland (P < 0.001).
In Ireland, interest shown in oral cancers increased significantly in the month of September (P < 0.001), when the awareness-raising week coincided with the MSC.
However, neither the introduction of Mouth Cancer Awareness Day (MCAD) in Ireland (P = 0.03) nor that of Mouth Cancer Action Month (MCAM) in the United Kingdom (P = 0.02) significantly increased the search volume in these countries at the Bonferroni-corrected significance level (P = 0.0008), which was applied because of the potential cumulative effect of the health warnings introduced on cigarette packs ahead of these campaigns.
In contrast, the data obtained for Wikipedia searches showed a significant decrease in the number of average monthly visits after the enforcement of directive 2014/40/EU on the French (P < 0.001) and Portuguese (P < 0.001) pages.
There was no significant temporal association between the MSC and the number of Wikipedia page visits concerning oral cancers.
On Twitter, the number of tweets increased significantly in April (P < 0.001), as shown by the regular peaks seen in Fig. 5. There was no change during the MSC (P = 0.13).
The study of the temporal association between the introduction of public health care programs and the publication of articles in the press showed a significant increase in the number of publications about oral cancers during each awareness campaign.
Several peaks of interest common to several databases were observed in September 2010, January and October 2016, March 2017 and January 2018.
The ANOVA and the Spearman correlation coefficients (Table 3) showed that associations and correlations existed between the data sources.
There was no relationship between articles in the press and the number of Wikipedia page visits. Apart from this finding, the other results were positively associated. That is, the analyzed variables increased with one another.
A weak correlation was found between the publication of articles in the press about oral cancers and (1) the interest shown in them on Google© (0.11; P < 0.001) and (2) the publication of scientific articles (0.12; p < 0.001). We observed a weak correlation between the publication of scientific articles and interest shown in oral cancers on Google© (0.21; P < 0.001). Finally, a very strong correlation was found between the publication of scientific articles and (1) articles appearing in the press (0.8; P < 0.001) and (2) the number of tweets published (0.96; P < 0.001).

Discussion
Our study shows a weak temporal association between the introduction of public health programs and the interest shown in oral cancers on the internet in most EU countries.

Shock images and health warnings
These results reveal an increase in interest shown in oral cancers after the introduction of health warnings. It has already been shown that the type of explicit message associated with shock images impacts smokers [22][23][24][25].
These warnings could be accompanied by educational therapeutic medical information. In Canada and Australia, advice about how to quit smoking is printed on cigarette packets.

European campaigns versus national campaigns
We found only a weak temporal association between the MSC and the interest shown in oral cancers on the internet, excluding that shown by the press. Within Europe, an upturn in interest, although not significant, was observed only in Ireland and the United Kingdom, both of which organized their own awareness campaigns. These campaigns are not organized solely by scholarly societies but, on a smaller scale, by dedicated foundations and associations that include patients in their organizational structure.

The importance of social networks and celebrities
We highlighted the fact that the general population tends to follow the news rather than look for precise medical information.
Twitter posts that provoked the most reactions came from influential accounts. Celebrities can influence the wider public for a cause, at least for those paying attention to these "stars" [29][30][31][32].
Evans et al. described the "Angelina Jolie effect", noting a significant increase in breast cancer screening in the United States after the actress publicly announced her mastectomy in May 2013 and called for more screening [33,34].
We also linked a spike in internet searches to an interview given in March 2017 by a former American football star (Jim Kelly), in which he called for Americans to be tested during the Oral, Head & Neck Cancer Awareness campaign (OHNCA).
Awareness of oral cancers could be raised by the collaboration of celebrities who could inform their fan base about the consequences of their life choices.

The influence of the names of prevention campaigns
The Wikipedia and Twitter© search tools include data from the United States in their English language search results since the algorithm does not allow for messages to be isolated or for searches by geographical area.
The United States organized an awareness campaign that appeared to generate an upturn in online interest. As with the campaigns organized in Ireland and the United Kingdom, and unlike the MSC, oral cancers are clearly identified in the name of the campaign.
A peak in interest in English-language databases in January 2018, after the launch of the American screening campaign called "Check Your Mouth™", supports this assessment.
Figures 2 and 3 show a downward trend: the linear trend of the curve constructed from the monthly absolute values from Wikipedia and Twitter is decreasing. We have no definitive explanation for this, but it is very likely driven by certain high values at the beginning of the period studied [2016 for Wikipedia (Fig. 1) and 2014 for Twitter (Fig. 2)].

Alcohol and HPV
Our study did not account for these risk factors due to the absence of a European awareness-raising policy specific to them.
Regarding risk factors that can influence the choice of keywords to better "understand" cancer, we found that the use of the Google Trends' "Related queries" function had no influence on the search selection strategy.
Our bibliometric analysis showed an increase in scientific interest in oral cancers.
This increase was particularly significant in January 2016 following the publication of the article by Agalliu et al. that demonstrated the role of HPV-16 in the pathogenesis of oral cancers [35].
However, Syrjänen et al. already demonstrated this 35 years ago, even if the scientific impact was not as important as that observed today [36][37][38].
Increases in the number of online searches were also seen in September 2010 and October 2016 after Michael Douglas was asked about oral cancer. The increase in cancers attributed to HPV infection, although localized in the throat, may therefore explain these peaks of interest in the population.
The population should be informed of the risk of oropharyngeal cancers connected to HPV infection. Health care professionals beyond mouth, head and neck specialists should also be informed, particularly about the benefits of vaccination against HPV [39][40][41].

Methodological tools
The objective of using several data sources was to make the research more relevant and more exhaustive by limiting the bias linked to the source effect of a single database. This type of research is inevitably limited in scope; however, multiplying the sources, unlike in conventional epidemiological surveys, makes it possible to better capture the desired effect [42,43].
The use of Google and Wikipedia, which are commonly consulted on the internet for information purposes, makes it possible to better identify the population's searches. The use of social networks such as Twitter, the most used network, is essential because it is as representative of the trends and interests of the population on specific subjects as any of the other networks, if not currently more so [44,45].
Finally, the bibliometric analysis and the press articles make it possible to add another level to the research and, thus, to refine the answer to our initial hypothesis by more precisely reflecting the interest of society in this issue of the fight against oral cancer [46].
Unlike a typical epidemiological investigation, populations and subpopulations could not be identified. The geographical area and inclusion period were vast and were not identical for all the databases analysed. It was not possible to know if the same internet users carried out multiple searches or performed searches from different devices.
The bibliometric period studied coincided with a general worldwide increase in the number of scientific publications, which is not, however, a guarantee of quality and scientific rigor [50][51][52][53].
Public health policies were not collected exhaustively, and the interest shown in oral cancers was dependent on an internet connection. Any causal link between these two factors can therefore be criticized and remains controversial. There are many potential confounding factors that could explain a correlation (as the COVID-19 pandemic has further demonstrated with rapid public action and a concomitantly increased number of connections) [54].
Thus, only a temporal association between public health programs and an interest in oral cancers on the internet in the EU can be presented [55].
The choice of keywords, which can sometimes be similar between sites despite different search preferences and can sometimes be unknown to the general public, is an inevitable bias in this type of research and a limitation that should be accounted for in the conclusions.
Similarly, there may have been a bias in the selection and interpretation of the results concerning social networks. It is not proven that all users of these platforms can be influenced by celebrities. These "star people" may therefore have less influence than supposed.
It was not possible to individually analyze the data sources used, such as Twitter accounts. There may therefore be a confounding factor in the analysis between the Twitter accounts of private individuals and those of public organizations that may echo awareness campaigns.
The concrete impact of policy measures on the number of screenings, consultations, diagnoses or waiting times was not considered in our work.
The temporal resolution at which the data were extracted and analyzed (weekly, aggregated by month) may have led some effects to be underestimated. Research has shown that news-related spikes in search and social media typically return to baseline after approximately three days. However, this approach filters out day-to-day noise without losing signals that persist for a few days [56].
Conversely, awareness campaigns play an active role in the organization of screening sessions. Although the evidence is limited, a visual examination during a screening campaign could reduce the oral cancer mortality rate among high-risk patients.
These campaigns therefore seem important in raising awareness for a scientifically well-documented condition that remains relatively unknown to the general public [57].

Conclusion
We put forward seven proposals that could refocus oral cancer awareness in Europe.